摘要 :
Mining labeled subgraph is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomor...
展开
Mining labeled subgraph is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomorphism task, which is computationally expensive, and thus they suffer from the lack of scalability problem when the graphs in the input database are large. In this work, we propose FS3, which is a sampling-based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS3 performs a Markov chain Monte Carlo (MCMC) sampling over the space of a fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. Besides, FS3 is equipped with an innovative queue manager. It stores the sampled subgraph in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on the database of large graphs show that FS3 is efficient and obtains subgraphs that are the most frequent among the subgraphs of a given size. (C) 2015 Wiley Periodicals, Inc.
收起
摘要 :
Given a large graph, like a computer communication network, which nodes should we immunize (or monitor, or remove), to make it as robust as possible against a computer virus attack? This problem, referred to as the node immunizat...
展开
Given a large graph, like a computer communication network, which nodes should we immunize (or monitor, or remove), to make it as robust as possible against a computer virus attack? This problem, referred to as the node immunization problem, is the core building block in many high-impact applications, ranging from public health, cybersecurity to viral marketing. A central component in node immunization is to find the best bridges of a given graph. In this setting, we typically want to determine the relative importance of a node (or a set of nodes) within the graph, for example, how valuable (as a bridge) a person or a group of persons is in a social network. First of all, we propose a novel ‘bridging’ score , inspired by immunology, and we show that its results agree with intuition for several realistic settings. Since the straightforward way to compute is computationally intractable, we then focus on the computational issues and propose a surprisingly efficient way () to estimate it. Experimental results on real graphs show that (1) the proposed ‘bridging’ score gives mining results consistent with intuition; and (2) th- proposed fast solution is up to faster than straightforward alternatives.
收起
摘要 :
In recent years, data mining in graphs or graph mining have attracted much attention due to explosive growth in generating graph databases. The graph database is one type of database that consists of either a single large graph or...
展开
In recent years, data mining in graphs or graph mining have attracted much attention due to explosive growth in generating graph databases. The graph database is one type of database that consists of either a single large graph or a number of relatively small graphs. Some applications that produce graph database are as follows: Biological networks, semantic web and behavioral modeling. Among all patterns occurring in graph database, mining frequent subgraphs is of great importance. The frequent subgraph is the one that occurs frequently in the graph database. Frequent subgraphs not only are important themselves but also are applicable in other aspects of data analysis and data mining tasks, such as similarity search in graph database, graph clustering, classification, indexing, etc. So far, numerous algorithms have been proposed for mining frequent subgraphs. This study aims to create overall view of the algorithms through the analysis and comparison of their characterizations. To achieve the aim, the existing algorithms are classified based on their graph database and their subgraph generation way. The proposed classification can be effective in choosing applications appropriate algorithms and determination of graph mining new methods in this regard.
收起
摘要 :
We propose a novel distributed algorithm for mining frequent subgraphs from a single, very large, labeled network. Our approach is the first distributed method to mine a massive input graph that is too large to fit in the memory o...
展开
We propose a novel distributed algorithm for mining frequent subgraphs from a single, very large, labeled network. Our approach is the first distributed method to mine a massive input graph that is too large to fit in the memory of any individual compute node. The input graph thus has to be partitioned among the nodes, which can lead to potential false negatives. Furthermore, for scalable performance it is crucial to minimize the communication among the compute nodes. Our algorithm, DistGraph, ensures that there are no false negatives, and uses a set of optimizations and efficient collective communication operations to minimize information exchange. To our knowledge DistGraph is the first approach demonstrated to scale to graphs with over a billion vertices and edges. Scalability results on up to 2048 IBM Blue Gene/Q compute nodes, with 16 cores each, show very good speedup.
收起
摘要 :
Mining frequent patterns in multigraphs is a challenging task in graph analysis with numerous real-world applications. This paper introduces a novel framework for frequent pattern mining on multi-graphs using the multi-SPMiner met...
展开
Mining frequent patterns in multigraphs is a challenging task in graph analysis with numerous real-world applications. This paper introduces a novel framework for frequent pattern mining on multi-graphs using the multi-SPMiner method. The approach is inspired by SPMiner, which was the first approach to employ deep learning in graph motif mining tasks. Multi-SPMiner builds on this foundation and focuses on the extraction of frequent motifs in single multi-graphs, specifically spatiotemporal graphs. Multi-SPMiner employs a two-step approach to extract the most frequent motifs in a graph with a high support value. In the first step, it embeds the nodes into an embedding order space, and in the second step, it performs a walk in the space to obtain the frequent motifs by iteratively growing the motif starting from a single node. The results obtained highlight the effectiveness of the proposed approach in identifying frequent motifs in single multigraphs, which is a crucial task in many real-world applications. Moreover, we demonstrate that our method is a generalization of SPMiner by testing it on single connection graphs.
收起
摘要 :
Graphs traditionally have many applications in various areas of computer science. Research in graph-based data mining has recently gained a high level of attraction due to its broad range of applications. Examples include XML docu...
展开
Graphs traditionally have many applications in various areas of computer science. Research in graph-based data mining has recently gained a high level of attraction due to its broad range of applications. Examples include XML documents, web logs, web searches and molecular biology. Most of the approaches used in these applications focus on deriving interesting, frequent patterns from given datasets. Two fundamental questions are, however, ignored; that is, how to derive a graph from a set of objects and how to order nodes according to their relations with others in the graph. In this paper, we provide approaches to building a graph from a given set of objects accompanied by their feature vectors, as well as to ranking nodes in the graph. The basic idea of our ranking approach is to quantify the important role of a node as the degree to which it has direct and indirect relationships with other nodes in a graph. A method for visualising graphs with ranking nodes is also presented. The visual examples and applications are provided to demonstrate the effectiveness of our approaches.
收起
摘要 :
A graph-based approach to document classification is described in this paper. The graph representation offers the advantage that it allows for a much more expressive document encoding than the more standard bag of words/phrases ap...
展开
A graph-based approach to document classification is described in this paper. The graph representation offers the advantage that it allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and consequently gives an improved classification accuracy. Document sets are represented as graph sets to which a weighted graph mining algorithm is applied to extract frequent subgraphs, which are then further processed to produce feature vectors (one per document) for classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency; only the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification algorithms together with a real world textual data set. The results demonstrate that the approach can outperform existing text classification algorithms on some dataset. When the size of dataset increased, further processing on extracted frequent features is essential.
收起
摘要 :
Abstract Event detection from textual content by using text mining concepts is a well-researched field in the literature. On the other hand, graph modeling and graph embedding techniques in recent years provide an opportunity to r...
展开
Abstract Event detection from textual content by using text mining concepts is a well-researched field in the literature. On the other hand, graph modeling and graph embedding techniques in recent years provide an opportunity to represent textual contents as graphs. Text can be enriched with additional attributes in graphs, and the complex relationships can be captured within the graphs. In this paper, we focus on news prediction and model the problem as subgraph prediction. More specifically, we aim to predict the news skeleton in the form of a subgraph. To this aim, graph-based representations of news articles are constructed and a graph mining based pattern extraction method is proposed. The proposed method consists of three main steps. Initially, graph representation of the news text is constructed. Afterwards, frequent subgraph mining and sequential rule mining algorithms are adapted for pattern prediction on graph sequences. We consider that a subgraph captures the main story of the contents, and the sequential rules indicate the subgraph patterns’ temporal relationships. Finally, extracted sequential patterns are used for predicting the future news skeleton (i.e. main features of the news). In order to measure the similarity, graph embedding techniques are also employed. The proposed method is analyzed on both a collection of news from an online newspaper and on a benchmark news dataset against baseline methods.
收起
摘要 :
Geometric graph mining has been identified as a need in many applications. This technique detects recurrent patterns in data taking into account some geometric distortions. To meet this need, some graph miners have been developed ...
展开
Geometric graph mining has been identified as a need in many applications. This technique detects recurrent patterns in data taking into account some geometric distortions. To meet this need, some graph miners have been developed for detecting frequent geometric subgraphs. However, there are few works that attend to actually apply this kind of pattern as feature for classification tasks. In this paper, a new geometric graph miner and a framework, for using frequent geometric subgraphs in classification, are proposed. Our solution was tested in the already reported AIDS database. The experimentation shows that our proposal gets better results than graph-based classification using non-geometric graph miners.
收起